This document serves as a demonstration of the capabilities of DGSA package. DGSA stands for “Distance Based Generalized Sensitivity Analysis”, and it is a method for computing parameter sensitivities of computer experiments with complex intputs and complex outputs. Originally the method was developed by Fenwick et al 2012 (reference).
DGSA package implements the original method and expands the original visualization developments by employing advanced graphics provided by “corrplot” and “ggplot2” packages.
The first step in dgsa is to cluster the outputs in some way. Responses can be functional, vectors, scalars, or even mixed functions and scalars. The most adequate clustering method depends on the data, therefore it is left to the user to decide which clustering method to chose in data pre-processing stage. The only cluster related input necesarry for dgsa are the cluster codes associated with each data entry or design point.
In the example given below we used simple principal component analysis with kmeans clustering.
library(DGSA)
INPUT <- read.csv("../data/Input.csv")
OUTPUT.damage <- read.csv("../data/Output_damage.csv")
comps <- prcomp(OUTPUT.damage)
scores <- t(t(comps$x[,1:2])/comps$sdev[1:2])
clustering = kmeans(scores, 2, 50)$cluster
plot(scores)
points(scores[clustering == 1,], col="red", pch = 19)
points(scores[clustering == 2,], col="blue", pch=19)
After clustering, user can either proceed straight to computations, or perform some exploratory data analysis of the cdf’s in order to make an educated decision on how many bins to use for computation of sensitivities of interactions or perhaps even remove some parameters. The package provides convenient tools for such exploratory data analyses. The first function that we will introduce is called “plotCDFS”, which as the name suggests plots parameter cdfs in the same form as they are used in the dgsa code. The code “all*" specifies that we are interested in cdfs of all available input parameters.
plotCDFS(clustering, INPUT, .code = "all*")
All CDFs
Similarly, by changing the “.code” parameter we can also plot cdfs of a single parameter, or even CDFs of interactions. The following two examples demonstrate such capabilities.
plotCDFS(clustering, INPUT, .code = "beta")
plotCDFS(clustering, INPUT, .code = "lambda")
CDFs of single parameters
plotCDFS(clustering, INPUT, .code = "beta|lambda", .nBins = 2)
CDFs of interaction for 2 bins
plotCDFS(clustering, INPUT, .code = "beta|lambda", .nBins = 3)
CDFs of interaction for 3 bins
To compute parameter sensitivities with DGSA one should use the following function.
myDGSA <- dgsa(clustering, INPUT, .interactions = TRUE, .nBoot = 100, .nBins = 3, .alpha = 0.95, .parallel = FALSE)
Note the controls that are available to users, such as the number of bootstrapped samples (.nBoot), number of bins for interactions (.nBins), significance test factor (.alpha), boolean operator (.interactions) specifying whether to compute interactions or not. In the current version parameter (.parallel) is disabled but it is left as an option for future implementation to speed up bootstrapping through parallelization.
This function returns a list with two components. The first component is a 3D matrix of size (nbClusters X nbParameters X nbParameters). Diagnoal elements of this matrix are the main effects, while off-diagonal elements are sensitivities of interactions. All elements are automatically normalized/sandardized by .alpha value from the boostrapped distributions. If users want to change normalization factor the function given above has to be executed one more time, in other words the code does not save bootstrapped samples like the original MATLAB code on github.
To aid better visualization of the main effects and interactions we make use of the “corrplot” package which was originally developed for visualizations of covariance matrices. Without much effort we can gain significant insights into parameter interactions, perform parameter rankings through matrix rearrangement, and finally summarize the entire sensitivity analysis in one plot instead of using several paretto plots. The following wrapper function was developed for efficient communication with “corrplot” functions.
plotMatrixDGSA(.dgsa, .hypothesis = TRUE, .method = "circle", ...)
Input parameter “.dgsa” is an object returned from the “dgsa” function (in our case “myDGSA”). The code will check if the object really came out of “myDGSA” function. The second parameter “.hypothesis” specifies whether to mark “non-sensitive” parameters on the correlation plot. User has a plethora of options that enable advanced markup of non-significant sensitivities (more on that later). The third parmeter is “.method” which specifies the plotting method passed to “corrplot” function. The choice of the method is limited by the “corrplot” capabilities. Finally, “…” parameters are all passed without change to “corrplot”. Therefore, interested users should consult “corrplot” documentation (help(corrplot)) to learn more about advanced plotting capabilities.
plotMatrixDGSA(myDGSA, .hypothesis = FALSE)
Main/Interaction plot without hypothesis testing
plotMatrixDGSA(myDGSA, .hypothesis = TRUE)
Main/Interaction plot with hypothesis testing
One of the best features of corrplot package is matrix reordering. Many mehtods have been implemented, mainly for covariance matrix reordering, however the one that was found to work best in DGSA setting is “hiearachical clustering” with a “single” linkage. Results of this approach are given below. Notice that the parameters were ranked based on both their main effects and interactions. Particularly interesting part is the lower right corner which highlights that about 7 parameters pop up as the most important.
plotMatrixDGSA(myDGSA, order = 'hclust', hclust.method='single', .hypothesis = FALSE)
Main/Interaction plot with matrix reordering (hierarchical clustering method)
If we turn “.hypothesis” parameter to TRUE the code will add “significance” marks to the matrix sensitivity plot to indicate which sensitivities were found to be insignificant in hypothesis testing. The following expample demonstrates such capability.
plotMatrixDGSA(myDGSA, order = 'hclust', hclust.method='single', .hypothesis = TRUE)
Main/Interaction plot with matrix reordering (hierarchical clustering method) + significance
A few more additional plots produced with advanced “corrplot” settings.
plotMatrixDGSA(myDGSA, order = 'hclust', hclust.method='single', tl.srt = 65)
Main/Interaction plot with parameter reordering (hierarchical clustering method) with rotated text
plotMatrixDGSA(myDGSA, .method = "number", .hypothesis = FALSE)
Main/Interaction plot “number”
plotMatrixDGSA(myDGSA, .method = "square", .hypothesis = FALSE)
Main/Interaction plot “square”
plotMatrixDGSA(myDGSA, .method = "shade", .hypothesis = FALSE)
Main/Interaction plot “shade”
plotMatrixDGSA(myDGSA, .method = "pie", .hypothesis = FALSE)
Main/Interaction plot “pie”
plotMatrixDGSA(myDGSA, .method = "pie", type = "lower", .hypothesis = FALSE)
Main/Interaction plot “pie” just lower diagonal part
plotMatrixDGSA(myDGSA, .method = "circle", diag=TRUE, .hypothesis = TRUE, tl.srt = 45,
insig = "pch", pch = "+", pch.col = "black", pch.cex = 1.5)```
DGSA package also has pareto plotting capabilities just like the original MATLAB code. For this feature we are using advanced graphics by “ggplot2” package. All wrapper functions also provide capability of returning an “ggplot” object for fine tunning of fonts/color via inline methods such as “g + ggtitle(‘mytitle’)”. etc. Users should consult ggplot2 documentation for more information.
plotParetoDGSA(myDGSA)
Pareto plot of main effects (ranked by mean)
plotParetoDGSA(myDGSA, .interaction = "beta")
Pareto plot of parameter sensitivity given parameter beta (interactions). Note “beta|beta” is just its main effect
plotParetoDGSA(myDGSA, .interaction = "lambda")
Pareto plot of parameter sensitivity given parameter lambda (interactions). Note “lambda|lambda” is just its main effect